{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Homework 2\n",
    "\n",
    "**Due: 02/13/2020**\n",
    "\n",
    "\n",
    "## References\n",
    "\n",
    "+ Lectures 7-10 (inclusive).\n",
    "\n",
    "## Instructions\n",
    "\n",
    "+ Type your name and email in the \"Student details\" section below.\n",
    "+ Develop the code and generate the figures you need to solve the problems using this notebook.\n",
    "+ For the answers that require a mathematical proof or derivation you can either:\n",
    "    \n",
    "    - Type the answer using the built-in latex capabilities. In this case, simply export the notebook as a pdf and upload it on gradescope; or\n",
    "    - You can print the notebook (after you are done with all the code), write your answers by hand, scan, turn your response to a single pdf, and upload on gradescope.\n",
    "\n",
    "\n",
    "**Note**: Please match all the pages corresponding to each of the questions when you submit on gradescope. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Student details\n",
    "\n",
    "+ **First Name:**\n",
    "+ **Last Name:**\n",
    "+ **Email:**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here are some modules that you may need - please run this block of code:\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "import seaborn as sns\n",
    "sns.set_context('talk')\n",
    "import numpy as np\n",
    "import scipy\n",
    "import scipy.stats as st"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 1\n",
    "### Part A\n",
    "Let $X$ be a continuous random variable with CDF:\n",
    "$$\n",
    "F(x) = p(X \\le x).\n",
    "$$\n",
    "Show that the random variable\n",
    "$$\n",
    "Z = F(X)\n",
    "$$\n",
    "    is distributed uniformly in $[0,1]$.\n",
    "    Hint: Show that:\n",
    "$$\n",
    "    F_Z(z) := p(Z \\le z) = z.\n",
    "$$\n",
    "**Proof:**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Type your proof here. Delete that ``<br>`` line (it just makes some white space).*\n",
    "<br><br><br><br><br><br><br><br><br><br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The theorem you have just proved is very useful when you want to test if a set of observations $x_1,x_2,\\dots,x_N$ has been indeed independent realizations of a random variable $X$ with CDF $F(x)$.\n",
    "Part A proves that, if this hypothesis is valid, then the transformed dataset $z_1,z_2,\\dots,z_N$, where\n",
    "$$\n",
    "z_i = F(x_i),\n",
    "$$\n",
    "should be distributed uniformely in $[0,1]$.\n",
    "In other words, the empirical histrogram of $z_1,z_2,\\dots,z_N$ should be a flat line.\n",
    "Use this observation to find the distribution from which this dataset was sampled:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.loadtxt('hw2_p1b_data.txt')\n",
    "fig, ax = plt.subplots()\n",
    "ax.hist(data, density=True, alpha=0.5)\n",
    "ax.set_xlabel('$x$')\n",
    "ax.set_ylabel('Empirical PDF')\n",
    "ax.set_title('Histogram of data')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The correct distribution is one of the following:\n",
    "\n",
    "1. Standard normal, $\\mathcal{N}(0,1)$;\n",
    "\n",
    "2. Normal with mean 2 and variance 2, $\\mathcal{N}(2,2)$; or\n",
    "\n",
    "3. Exponential with rate parameter $1$, $\\mathcal{E}(1)$; or\n",
    "\n",
    "4. Exponential with rate parameter $2$, $\\mathcal{E}(2)$; or\n",
    "\n",
    "5. Exponential with rate parameter $10$, $\\mathcal{E}(10)$; or\n",
    "\n",
    "6. Gamma distribution with parameters $\\alpha=2.$ and $\\beta=3.$.\n",
    "\n",
    "Systematically go over these distributions and try to determine which one generated the data.\n",
    "All the required CDF's and inverse CDF's are implemented in [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html).\n",
    "Check also [scipy.stats.rv_continuous](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html#scipy.stats.rv_continuous).)\n",
    "Please pay special attention to the defintiion of the probability distributions of the various random variables and how you can control their parameters.\n",
    "As a hint, here is how you can test for $\\mathcal{N}(0,1)$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Testing for N(0,1):\n",
    "transformed_data = st.norm(loc=0, scale=1).cdf(data) # cdf() gives the CDF of a random variable\n",
    "# If the data came from N(0,1), the histogram of the transformed_data should match that of a uniform:\n",
    "fig, ax = plt.subplots()\n",
    "ax.hist(transformed_data, density=True, alpha=0.5)\n",
    "ax.plot(np.linspace(0, 1,50), np.ones(50), lw=2, color='r') # This is the line that you should try to much.\n",
    "ax.set_xlabel('$z = F(x)$')\n",
    "ax.set_ylabel('Empirical PDF')\n",
    "ax.set_title('Testing the $\\mathcal{N}(0,1)$');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 2\n",
    "\n",
    "Consider the following data set $x_1,\\dots,x_N$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.loadtxt('hw2_p2_data.txt')\n",
    "data = np.array(data)\n",
    "fig, ax = plt.subplots()\n",
    "ax.hist(data, alpha=0.5)\n",
    "ax.set_xlabel('$x$')\n",
    "ax.set_ylabel('Empirical PDF')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your goal is to generate a procedure that samples from the same distribution as the observed data. \n",
    "This is a variation of the standard *density estimation* problem.\n",
    "In general, this is a very difficult problem and we will see various ways to solve it later on.\n",
    "In this problem, you will develop a simple method that relies on the empirical CDF of the observed data.\n",
    "Needless to say, this method works only for one dimensional cases in which you have a lot of observations.\n",
    "\n",
    "The [empirical CDF](https://en.wikipedia.org/wiki/Empirical_distribution_function) of our data set, $x_1,\\dots,x_N$ is defined to be:\n",
    "$$\n",
    "\\hat{F}_N(x) = \\frac{\\text{Number of observations}\\;\\le x}{N} = \\frac{1}{N}\\sum_{i=1}^N 1_{[x_i,+\\infty]}(x),\n",
    "$$\n",
    "where $1_A(x)$ is the [indicator function](https://en.wikipedia.org/wiki/Indicator_function) of the set $A$.\n",
    "Using the, so called, [strong law of large numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers#Strong_law), we can show that $\\hat{F}_N(x)$ converges to the true CDF of the data as $N\\rightarrow+\\infty$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part A\n",
    "\n",
    "Complete the code that calculates the empirical CDF:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "def myECDF_base(x):\n",
    "    \"\"\"\n",
    "    Make this code work if ``x`` is a simple scalar.\n",
    "    \n",
    "    :param x:    The point at which you want to observe the PDF.\n",
    "    :returns:    The value of the empirical CDF at ``x``.\n",
    "    \"\"\"\n",
    "    N = data.shape[0]\n",
    "    # Write your code here (delete the next line and return the right value)\n",
    "    raise NotImplementedError('Implement me!')\n",
    "\n",
    "# Vectorize your function (i.e., make it work with 1D numpy arrays).\n",
    "# See this: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.vectorize.html\n",
    "myECDF = np.vectorize(myECDF_base)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You can test your results by comparing the empirical CDF you can get with of scipy.stats\n",
    "# The two should match almost exactly\n",
    "hist_rv = st.rv_histogram(np.histogram(data, bins=1000))\n",
    "fig, ax = plt.subplots()\n",
    "# The range in which the x's takes values:\n",
    "x_min = data.min()\n",
    "x_max = data.max()\n",
    "xx = np.linspace(x_min, x_max, 100)\n",
    "ax.plot(xx, myECDF(xx), label='My empirical CDF')\n",
    "ax.plot(xx, hist_rv.cdf(xx), '--', label='Empirical CDF from scipy.stats')\n",
    "ax.set_xlabel('$x$')\n",
    "ax.set_ylabel('Empirical CDF')\n",
    "plt.legend(loc='best')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part B\n",
    "\n",
    "Now complete the code that computes the inverse of the empirical CDF $\\hat{F}^{-1}$.\n",
    "There are may ways of doing this.\n",
    "Let's do it in a way that will teach us something about the root finding toolbox of numpy (see [this]()).\n",
    "Mathematically, we wish to find a function $F^{-1}$ such that\n",
    "$$\n",
    "F(F^{-1}(u))) = u,\n",
    "$$\n",
    "for any $u\\in[0,1]$ (the domain in which $F(x)$ takes values).\n",
    "It is obvious that $F^{-1}(u)$ is the solution to the *root finding* problem:\n",
    "$$\n",
    "F(x^*) = u.\n",
    "$$\n",
    "Since we know that $F$ is increasing, this problem must have a unique solution for any $u\\in[0,1]$.\n",
    "To find this solution, we can use [Brent's method](https://en.wikipedia.org/wiki/Brent%27s_method).\n",
    "Please note, that the problem that this code solves is of the form:\n",
    "$$\n",
    "g(x^*) = 0.\n",
    "$$\n",
    "So, you will have to reformulate the original problem as:\n",
    "$$\n",
    "F(x^*) - u = 0.\n",
    "$$\n",
    "Study the [numpy implementation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brentq.html#scipy.optimize.brentq) of Brent's method and complete the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy import optimize    # Gives you access to optimize.brentq\n",
    "\n",
    "def myiECDF_base(u):\n",
    "    \"\"\"\n",
    "    Evaluates the inverse of the empirical CDF.\n",
    "    \n",
    "    :param u:   A scalar at which to evaluate the function.\n",
    "    :returns:    The value of the inverse of the empirical CDF at ``u``.\n",
    "    \"\"\"\n",
    "    # Write your code here\n",
    "    # You will have to define a functioin to pass to optimize.brentq\n",
    "    # You can define this function in here\n",
    "    # (delete the next line and return the right value)\n",
    "    raise NotImplementedError('Implement me!')\n",
    "\n",
    "# Vectorize your function (i.e., make it work with 1D numpy arrays).\n",
    "# See this: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.vectorize.html\n",
    "myiECDF = np.vectorize(myiECDF_base)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# You can test your results by comparing the inverse of the empirical CDF you can get with scipy.stats\n",
    "# The two should match almost exactly\n",
    "hist_rv = st.rv_histogram(np.histogram(data, bins=1000))\n",
    "fig, ax = plt.subplots()\n",
    "uu = np.linspace(1e-3, 1, 100)    # For convergence issues we cannot start at 0\n",
    "ax.plot(uu, myiECDF(uu), label='Inverse of my empirical CDF')\n",
    "ax.plot(uu, hist_rv.ppf(uu), '--', label='Inverse of empirical CDF from scipy.stats')\n",
    "ax.set_xlabel('$x$')\n",
    "ax.set_ylabel('Inverse of empirical CDF')\n",
    "plt.legend(loc='best')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part C\n",
    "\n",
    "Now use the *inverse transform sampling* method to generate samples from same distribution as the original data.\n",
    "That is, you can now generate uniform samples:\n",
    "$$\n",
    "u_i \\sim U([0,1]),\n",
    "$$\n",
    "and transform them as:\n",
    "$$\n",
    "\\hat{x}_i = {\\hat{F}}^{-1}(u_i).\n",
    "$$\n",
    "The $\\hat{x}_i$'s generated in this way should have the same distribution of the data you started with.\n",
    "Verify this by comparing the histrogram of $1,000$ $\\hat{x}_i$ samples with the original data of this problem."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write your code here.\n",
    "# Feel free to copy paste code from above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem 3\n",
    "\n",
    "This is a classic uncertainty propagation problem that you will have to solve using Monte Carlo sampling.\n",
    "Consider the following stochastic harmonic oscillator:\n",
    "$$\n",
    "\\begin{array}{ccc}\n",
    "\\ddot{y} + 2 \\zeta \\omega(X) \\dot{y} + \\omega^2(X)y &=& 0,\\\\\n",
    "y(0) &=& y_0(X),\\\\\n",
    "\\dot{y}(0) &=& v_0(X),\n",
    "\\end{array}\n",
    "$$\n",
    "where:\n",
    "+ $X = (X_1, X_2, X_3)$,\n",
    "+ $X_i \\sim N(0, 1)$,\n",
    "+ $\\omega(X) = 2\\pi + X_1$, \n",
    "+ $\\zeta = 0.01$,\n",
    "+ $y_0(X) = 1+ 0.1 X_2$, and\n",
    "+ $v_0 = 0.1 X_3$.\n",
    "\n",
    "In words, this stochastic harmonic oscillator has an uncertain natural frequency and uncertain initial conditions.\n",
    "\n",
    "Our goal is to propagate uncertainty through this dynamical system, i.e., estimate the mean and variance of its solution.\n",
    "A solver for this dynamical system is given below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Solver(object):\n",
    "    def __init__(self, nt=100, T=5):\n",
    "        \"\"\"\n",
    "        This is the initializer of the class.\n",
    "        \n",
    "        Arguments:\n",
    "            nt - The number of timesteps.\n",
    "            T  - The final time.\n",
    "        \"\"\"\n",
    "        self.nt = nt\n",
    "        self.T = T\n",
    "        self.t = np.linspace(0, T, nt) # The timesteps on which we will get the solution\n",
    "        # The following are not essential, but they are convenient\n",
    "        self.num_input = 3             # The number of inputs the class accepts\n",
    "        self.num_output = nt           # The number of outputs the class returns\n",
    "        \n",
    "    def __call__(self, x):\n",
    "        \"\"\"\n",
    "        This special class method emulates a function call.\n",
    "        \n",
    "        Arguments:\n",
    "            x - A 1D numpy array with 3 elements. This represents the stochastic input x = (x1, x2, x3).\n",
    "        \"\"\"\n",
    "        ##uncertain quantities \n",
    "        x1 = x[0]\n",
    "        x2 = x[1]\n",
    "        x3 = x[2]\n",
    "        \n",
    "        #ODE parameters \n",
    "        omega = 2*np.pi + x1 \n",
    "        y10 = 1 + 0.1*x2\n",
    "        y20 = 0.1*x3\n",
    "        y0 = np.array([y10, y20])   #initial conditions \n",
    "        \n",
    "        #coefficient matrix \n",
    "        zeta = 0.01\n",
    "        k = omega**2    ##spring constant\n",
    "        c = 2*zeta*omega   ##damping coeff. \n",
    "        C = np.array([[0, 1],[-k, -c]])\n",
    "        \n",
    "        #RHS of the ODE system\n",
    "        def rhs(y, t):\n",
    "            return np.dot(C, y)\n",
    "        \n",
    "        y = scipy.integrate.odeint(rhs, y0, self.t)\n",
    "        \n",
    "        return y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's plot a few samples of the forward model to demonstrate how the solver works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. Create the solver object\n",
    "solver = Solver()\n",
    "fig1, ax1 = plt.subplots()\n",
    "ax1.set_xlabel('$t$ (Time)')\n",
    "ax1.set_ylabel('$y(t)$ (Position)')\n",
    "fig2, ax2 = plt.subplots()\n",
    "ax2.set_xlabel('$t$ (Time)')\n",
    "ax2.set_ylabel('$\\dot{y}(t)$ (Velocity)')\n",
    "for i in range(2):\n",
    "    # Sample the random inputs (they are just standard normal)\n",
    "    x = np.random.randn(solver.num_input) # solver.num_input is just 3\n",
    "    # Evaluate the solver response\n",
    "    y = solver(x) # This returns an (num timesteps) x (num states) array (100 x 2 here)\n",
    "    # Plot the sample\n",
    "    ax1.plot(solver.t, y[:, 0])\n",
    "    ax2.plot(solver.t, y[:, 1], label='Sample {0:d}'.format(i+1))\n",
    "plt.legend(loc=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For your convenience, here is code that takes many samples of the solver at once:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def take_samples_from_solver(num_samples):\n",
    "    \"\"\"\n",
    "    Takes ``num_samples`` from the ODE solver.\n",
    "    \n",
    "    Returns them in an array of the form: ``num_samples x 100 x 2`` (100 timesteps, 2 states (position, velocity))\n",
    "    \"\"\"\n",
    "    samples = np.ndarray((num_samples, 100, 2))\n",
    "    for i in range(num_samples):\n",
    "        samples[i, :, :] = solver(np.random.randn(solver.num_input))\n",
    "    return samples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It works like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "samples = take_samples_from_solver(50)\n",
    "print(samples.shape)\n",
    "print(samples)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part A\n",
    "Take 100 samples of the solver output and plot the estimated mean position and velocity as a function of time along with a 95\\% epistemic uncertainty interval around it. \n",
    "This interval captures how sure you are about the mean response when using only 100 Monte Carlo samples.\n",
    "You need to use the central limit theorem to find it (see the lecture notes)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "samples = take_samples_from_solver(100)\n",
    "# Sampled positions are: samples[:, :, 0]\n",
    "# Sampled velocities are: samples[:, :, 1]\n",
    "# Sampled position at the 10th timestep is: samples[:, 9, 0]\n",
    "# etc.\n",
    "\n",
    "# Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part B\n",
    "\n",
    "Plot the epistemic uncertainty about the mean response at $t=5$s as a function of the number of samples. \n",
    "\n",
    "**Solution**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part C\n",
    "Repeat part A and B for the squared response.\n",
    "That is, do exactly the same thing as above, but consider $y^2(t)$ and $\\dot{y}^2(t)$ instead of $y(t)$ and $\\dot{y}(t)$.\n",
    "How many samples do you need to estimate the mean squared response at $t=5$s with negligible epistemic uncertainty?\n",
    "\n",
    "**Solution**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part D\n",
    "\n",
    "Now that you know how many samples you need to estimate the mean of the response and the square response, use the formula:\n",
    "$$\n",
    "\\mathbb{V}[y(t)] = \\mathbb{E}[y^2(t)] - \\left(\\mathbb{E}[y(t)]\\right)^2,\n",
    "$$\n",
    "and similarly for $\\dot{y}(t)$, to estimate the variance of the position and the velocity with negligible epistemic uncertainty.\n",
    "Plot both quantities as a function of time.\n",
    "\n",
    "**Solution**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Part E\n",
    "\n",
    "Put together the estimated mean and variance to plot a 95\\% predictive interval for the position and the velocity as functions of time.\n",
    "\n",
    "**Solution**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-End-"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  },
  "latex_envs": {
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 0
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
